Search CORE

67 research outputs found

Protein structure similarity from principle component correlation analysis

Author: Chou James
Wong Stephen TC
Zhou Xiaobo
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Owing to the rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on the functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square deviation (RMSD) of their best-superimposed atomic coordinates. RMSD is the gold standard for measuring structural similarity when the structures are nearly identical; it fails, however, to detect higher-order topological similarities in proteins that have evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be used to identify homologous protein structures or topologies and thereby quantify both close and remote structural similarities. RESULTS: We measure structural similarity between proteins by correlating the principal components of their secondary structure interaction matrices. In our approach, Principal Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed from relationship parameters between secondary structure elements, which can take the form of distance, orientation, or other relevant structural invariants. When a distance-based construction is used, with or without an encoded N-to-C terminal sense, the principal components of the interaction matrices of structurally or topologically similar proteins are strongly correlated. CONCLUSION: The PCC method was extensively tested on protein structures that belong to the same topological class but differ significantly by the RMSD measure. PCC analysis can also differentiate proteins that have similar shapes but different topological arrangements.
Additionally, we demonstrate that when two independently defined interaction matrices are used, comparing their maximum eigenvalues can be highly effective in clustering structurally or topologically similar proteins. We believe that PCC analysis of interaction matrices is highly flexible in adopting various structural parameters for protein structure comparison.
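To make the PCC idea concrete, here is a minimal Python sketch under stated assumptions, not the authors' code: each secondary structure element (SSE) is reduced to a hypothetical 3D centroid, the interaction matrix uses plain Euclidean distances with no N-to-C terminal encoding, both proteins have the same number of SSEs (so their first principal components can be correlated entry by entry), and the leading eigenvector is found by power iteration. All function names are illustrative.

```python
import math


def interaction_matrix(centroids):
    """Symmetric distance-based interaction matrix: entry (i, j) is the
    Euclidean distance between the (hypothetical) centroids of SSEs i and j."""
    n = len(centroids)
    return [[math.dist(centroids[i], centroids[j]) for j in range(n)]
            for i in range(n)]


def principal_component(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by power
    iteration. Starting from a positive vector keeps every iterate positive
    for a distance matrix with positive off-diagonal entries, so the sign of
    the eigenvector is consistent between proteins."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v is the eigenvalue estimate for a unit vector v.
    mv = [sum(matrix[i][k] * v[k] for k in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(mv, v)), v


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pcc_similarity(centroids_a, centroids_b):
    """Correlate the first principal components of two interaction matrices
    and also return both maximum eigenvalues."""
    if len(centroids_a) != len(centroids_b):
        raise ValueError("this sketch assumes equal numbers of SSEs")
    lam_a, pc_a = principal_component(interaction_matrix(centroids_a))
    lam_b, pc_b = principal_component(interaction_matrix(centroids_b))
    return pearson(pc_a, pc_b), lam_a, lam_b


# Hypothetical SSE centroids (angstroms) for a toy five-element structure.
protein_a = [(0, 0, 0), (3.8, 0, 0), (5, 4, 1), (1, 6, 2), (-2, 3, 5)]
# The same arrangement rotated 90 degrees about z and shifted along x.
protein_b = [(10 - y, x, z) for x, y, z in protein_a]
r, lam_a, lam_b = pcc_similarity(protein_a, protein_b)
print(f"r = {r:.4f}")  # r = 1.0000
```

Because a rigid rotation and translation leave all pairwise distances unchanged, the two toy structures yield the same interaction matrix, a correlation of 1 and equal maximum eigenvalues; comparing the returned eigenvalues is only a very simplified echo of the maximum-eigenvalue clustering described above.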

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces

Author: Wong Stephen TC
Wu Ling-Yun
Xia Zheng
Zhou Xiaobo
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

biological space

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central


Boosting alternating decision trees modeling of disease trait information

Author: Lin Jennifer
Liu Kuang-Yu
Wong Stephen TC
Zhou Xiaobo
Publication venue: BioMed Central
Publication date: 01/01/2005

We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies formed by combining two boosting iterations (log or exponential loss functions) with two choices of tree generation type (a full alternating decision tree or a classic boosting decision tree). These four strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute the average prediction score for each subject with a specific trait profile, this process was repeated for 10 runs of 10-fold cross-validation, and the standardized prediction scores from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. Our QTL analyses based on the four models (a full alternating decision tree or a classic boosting decision tree, each paired with either the log or the exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9.
Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as relevance measures, we identified all relevant phenotypes for all four populations except phenotype b for the Karangar population, with suggested subgroup structure consistent with the latent traits used in the model. In conclusion, our findings suggest that the ADTrees method may offer a more accurate representation of disease status that allows better detection of linkage evidence.
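The averaging step above can be sketched in a few lines of Python. This assumes that "standardized" means a z-score computed within each cross-validation run (using the population standard deviation); the ADTree classifiers themselves are not reproduced, and the scores and function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-score one run's prediction scores (assumes they are not all equal)."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def average_standardized_scores(runs):
    """Average each subject's z-score across runs; runs[r][i] is the
    prediction score of subject i in cross-validation run r."""
    standardized = [standardize(run) for run in runs]
    n_subjects = len(runs[0])
    return [mean(run[i] for run in standardized) for i in range(n_subjects)]


# Hypothetical scores for three subjects from two runs on different scales.
runs = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
trait = average_standardized_scores(runs)
print([round(t, 3) for t in trait])  # [-1.225, 0.0, 1.225]
```

Standardizing within each run before averaging keeps a run whose scores happen to sit on a larger scale from dominating the resulting quantitative trait.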

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Context based mixture model for cell phase identification in automated fluorescence microscopy

Author: King Randy W
Wang Meng
Wong Stephen TC
Zhou Xiaobo
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Automated identification of the cell cycle phases of individual live cells in a large population, captured by automated fluorescence microscopy, is important for cancer drug discovery and cell cycle studies. Time-lapse fluorescence microscopy images provide an important means of studying the cell cycle process under different perturbation conditions. Existing methods are limited in handling such time-lapse data sets, while manual analysis is not feasible. This paper applies statistical data analysis and statistical pattern recognition to this task. RESULTS: The data were generated from HeLa H2B-GFP cells imaged over a 2-day period, with images acquired 15 minutes apart using automated time-lapse fluorescence microscopy. The patterns are described with four kinds of features: twelve general features, Haralick texture features, Zernike moment features, and wavelet features. To generate a new set of features with more discriminative power, commonly used feature reduction techniques are applied, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Maximum Margin Criterion (MMC), Stepwise Discriminant Analysis based Feature Selection (SDAFS), and Genetic Algorithm based Feature Selection (GAFS). We then propose a Context Based Mixture Model (CBMM) for handling the time-series cell sequence information and compare it with other traditional classifiers: Support Vector Machine (SVM), Neural Network (NN), and K-Nearest Neighbor (KNN). As is standard practice in machine learning, we systematically compare the performance of several common feature reduction techniques and classifiers to select an optimal combination of feature reduction technique and classifier. A cellular database containing 100 manually labelled subsequences is built to evaluate the performance of the classifiers, and the generalization error is estimated by cross-validation.
The experimental results show that CBMM outperforms all other classifiers in identifying prophase and has the best overall performance. CONCLUSION: Applying feature reduction techniques can improve prediction accuracy significantly. CBMM effectively utilizes contextual information and has the best overall performance when combined with any of the feature reduction techniques mentioned above.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Conditional random pattern model for copy number aberration detection

Author: Chang Chung-Che
Huang Wanting
Li Fuhai
Wong Stephen TC
Zhou Xiaobo
Publication venue: BioMed Central
Publication date: 01/04/2010

Background: DNA copy number aberration (CNA) plays an important role in the pathogenesis of tumors and other diseases. For example, CNAs may result in the suppression of anti-oncogenes and the activation of oncogenes, which can cause certain types of cancer. High-density single nucleotide polymorphism (SNP) array data are widely used for CNA detection. However, detecting CNAs automatically is nontrivial because the signals obtained from high-density SNP arrays often have a low signal-to-noise ratio (SNR), which may be caused by whole genome amplification, mixtures of normal and tumor cells, experimental noise, or other technical limitations. With reduced SNR, many false CNA regions are detected and true CNA regions are missed. Thus, more sophisticated statistical models are needed to make CNA detection from low-SNR signals more robust and reliable. Results: This paper presents a conditional random pattern (CRP) model for CNA detection in which many contextual cues are exploited to suppress noise and improve detection accuracy. Both simulated and real data are used to evaluate the proposed model, and the validation results show that, compared with a number of widely used software packages, the CRP model is more robust and reliable in the presence of noise for CNA detection from high-density SNP array data. Conclusions: The proposed CRP model can effectively detect CNA regions in the presence of noise.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

High content image analysis for human H4 neuroglioma cells exposed to CuO nanoparticles

Author: Huang Xudong
Li Fuhai
Ma Jinwen
Wong Stephen TC
Zhou Xiaobo
Zhu Jinmin
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: High content screening (HCS)-based image analysis is becoming an important and widely used research tool. Capitalizing on this technology, ample cellular information can be extracted from high content cellular images. In this study, an automated, reliable, and quantitative cellular image analysis system developed in-house was employed to quantify the toxic responses of human H4 neuroglioma cells exposed to metal oxide nanoparticles. This system has proved to be an essential tool in our study. Results: Cellular images of H4 neuroglioma cells exposed to different concentrations of CuO nanoparticles were acquired using an IN Cell Analyzer 1000. A fully automated cellular image analysis system was developed to perform the image analysis for cell viability. A multiple adaptive thresholding method was used to classify the pixels of the nuclei image into three classes: bright nuclei, dark nuclei, and background. During the development of our image analysis methodology, we achieved the following: (1) Gaussian filtering at an appropriate scale was applied to the cellular images to generate a local intensity maximum inside each nucleus; (2) a novel local intensity maxima detection method based on the gradient vector field was established; and (3) a statistical model based splitting method was proposed to overcome under-segmentation. Computational results indicate that 95.9% of nuclei can be detected and segmented correctly by the proposed image analysis system. Conclusion: The proposed automated image analysis system can effectively segment images of human H4 neuroglioma cells exposed to CuO nanoparticles. The computational results confirmed our biological finding that human H4 neuroglioma cells show a dose-dependent toxic response to insult by CuO nanoparticles.
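The three-class pixel labelling described above (background, dark nuclei, bright nuclei) can be illustrated with a global three-class Otsu threshold. This is a plainly swapped-in, non-adaptive stand-in, not the authors' multiple adaptive thresholding method; it assumes 8-bit integer intensities, and the function names and toy pixel values are hypothetical.

```python
def two_level_otsu(pixels, levels=256):
    """Return thresholds (t1, t2) that split integer intensities into three
    classes, background <= t1 < dark <= t2 < bright, by exhaustively
    maximizing the between-class variance (three-class Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    # Prefix sums of counts and intensity-weighted counts.
    cum_n = [0] * (levels + 1)
    cum_s = [0] * (levels + 1)
    for g in range(levels):
        cum_n[g + 1] = cum_n[g] + hist[g]
        cum_s[g + 1] = cum_s[g] + g * hist[g]
    mu_total = cum_s[levels] / total
    best, best_t = -1.0, (0, 0)
    for t1 in range(levels - 2):
        for t2 in range(t1 + 1, levels - 1):
            between = 0.0
            for lo, hi in ((0, t1 + 1), (t1 + 1, t2 + 1), (t2 + 1, levels)):
                n = cum_n[hi] - cum_n[lo]
                if n == 0:
                    continue  # an empty class adds nothing
                mu = (cum_s[hi] - cum_s[lo]) / n
                between += n * (mu - mu_total) ** 2
            if between > best:
                best, best_t = between, (t1, t2)
    return best_t


def classify(pixel, t1, t2):
    """0 = background, 1 = dark nucleus, 2 = bright nucleus."""
    return 0 if pixel <= t1 else (1 if pixel <= t2 else 2)


# Hypothetical toy intensities: background, dark-nucleus and bright-nucleus pixels.
pixels = [10] * 5 + [100] * 5 + [200] * 5
t1, t2 = two_level_otsu(pixels)
labels = [classify(p, t1, t2) for p in pixels]
print(t1, t2)  # 10 100
```

A real pipeline would compute such thresholds locally (for example per tile or per region) to cope with uneven illumination, which is presumably what "adaptive" refers to in the abstract.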

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens

Author: Bakal Chris
Li Fuhai
Perrimon Norbert
Sun Youxian
Wong Stephen TC
Yin Zheng
Zhou Xiaobo
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes under different experimental conditions has depended on the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle but important deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates imaging trillions of cells in thousands of diverse experimental conditions, such as RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, developing automated methods that identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should use prior/existing information and tackle three challenging tasks: recovering predefined biologically meaningful phenotypes, differentiating novel phenotypes from known ones, and distinguishing novel phenotypes from each other. Arbitrarily extracted information leads to biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens. Results: Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. The method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype and is then used as the reference distribution in the gap statistics.
The method is broadly applicable to many types of image-based datasets derived from a wide spectrum of experimental conditions and can adaptively process new images as they are continuously added to existing datasets. Validation was carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4], and a synthetic dataset of polygons; our methods tackled the three aforementioned tasks effectively, with accuracy ranging from 85% to 90%. When implemented in a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes to the regulation of cell shape, our method efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure that modifies the novelty detection method based on one-class SVM so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method on various datasets, and our methods consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our methods can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms. Conclusion: We demonstrate that our method can effectively detect various novel phenotypes in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, under variations in starting conditions (including the number and composition of existing phenotypes), and on datasets from different screens.
Our findings indicate that the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
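For reference, the standard gap statistic of Tibshirani, Walther and Hastie compares the within-cluster dispersion W_k of a k-cluster partition with its expectation E*_n under a reference distribution. Here C_r is cluster r, n_r its size, and d_ii' a pairwise distance (commonly the squared Euclidean distance). According to the abstract, the method above uses the GMM fitted to each existing phenotype as that reference distribution instead of the usual uniform one; how the expectation is estimated and what the "improved" variant changes are not stated here, so only the textbook form is shown.

```latex
D_r = \sum_{i, i' \in C_r} d_{i i'}, \qquad
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r, \qquad
\mathrm{Gap}_n(k) = \mathbb{E}^{*}_{n}\left[\log W_k\right] - \log W_k
```

A large gap means the observed clustering is much tighter than expected under the reference, which is the cue used to decide whether clusters should be kept separate or merged.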

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bioluminescence imaging reveals inhibition of tumor cell proliferation by Alzheimer's amyloid β protein

Author: Cui Kemi
Kesari Santosh
O'Brien Megan
Wong Kelvin K
Wong Stephen TC
Xia Weiming
Xu Xiaoyin
Zhao Hong
Zhu Jinmin
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Cancer and Alzheimer's disease (AD) are two seemingly distinct diseases that rarely occur simultaneously in patients. To explore the molecular determinants that differentiate the pathogenic routes towards AD or cancer, we investigated the role of amyloid β protein (Aβ) in multiple tumor cell lines stably expressing luciferase (human glioblastoma U87, human breast adenocarcinoma MDA-MB231, and mouse melanoma B16F). Results: Quantification of the photons emitted from the MDA-MB231 or B16F cells revealed significant inhibition of cell proliferation by conditioned media (CM) derived from amyloid precursor protein (APP) over-expressing cells. Inhibition of U87 cells was observed only after the media had been conditioned for longer than 2 days with APP over-expressing cells. Conclusion: Our results suggest that Aβ plays an inhibitory role in tumor cell proliferation; this effect may depend on the type of tumor cell and the amount of Aβ.

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

3D cell nuclei segmentation based on gradient flow tracking

Author: Guo Lei
Holley Scott
Li Gang
Liu Tianming
Mara Andrew
Nie Jingxin
Tarokh Ashley
Wong Stephen TC
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Reliable segmentation of cell nuclei from three-dimensional (3D) microscopic images is an important task in many biological studies. We present a novel, fully automated method for segmenting cell nuclei from 3D microscopic images. It was designed specifically to segment nuclei in images where the nuclei are closely juxtaposed or touching each other. The segmentation approach has three stages: 1) a gradient diffusion procedure, 2) gradient flow tracking and grouping, and 3) local adaptive thresholding. Results: Both qualitative and quantitative results on synthesized and original 3D images are provided to demonstrate the performance and generality of the proposed method. Both the over-segmentation and under-segmentation percentages of the proposed method are around 5%. The volume overlap with expert manual segmentation is consistently over 90%. Conclusion: The proposed algorithm can segment closely juxtaposed or touching cell nuclei obtained from 3D microscopy imaging with reasonable accuracy.
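The tracking-and-grouping stage can be illustrated with a much simplified 2D, discrete sketch in Python: every foreground pixel follows steepest ascent over its 8-neighbourhood until it reaches a local maximum (a "sink"), and pixels that reach the same sink form one nucleus. This is an assumption-laden stand-in, not the published 3D method: it skips gradient diffusion, uses raw intensity instead of a diffused gradient field, and replaces local adaptive thresholding with one fixed, hypothetical threshold.

```python
def track_to_sink(image, r, c):
    """Follow steepest ascent over the 8-neighbourhood until no neighbour is
    brighter. Each step strictly increases intensity, so the walk ends; on a
    flat plateau it simply stops, which could split one nucleus in practice."""
    rows, cols = len(image), len(image[0])
    while True:
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and image[nr][nc] > image[best[0]][best[1]]):
                    best = (nr, nc)
        if best == (r, c):
            return r, c
        r, c = best


def group_by_sink(image, threshold):
    """Label each foreground pixel (value > threshold) with the index of the
    sink it flows to; returns a dict mapping (row, col) -> label."""
    labels, sink_ids = {}, {}
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value <= threshold:
                continue
            sink = track_to_sink(image, r, c)
            labels[(r, c)] = sink_ids.setdefault(sink, len(sink_ids))
    return labels


# Hypothetical 3x9 intensity image with two touching blobs (peaks 6 and 7).
image = [
    [1, 2, 3, 2, 1, 2, 3, 2, 1],
    [2, 4, 6, 4, 2, 4, 7, 4, 2],
    [1, 2, 3, 2, 1, 2, 3, 2, 1],
]
labels = group_by_sink(image, threshold=1)
print(sorted(set(labels.values())))  # [0, 1]
```

Grouping by sink rather than by connected component is what lets two touching blobs come out as separate objects here, which mirrors the motivation stated in the abstract.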

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Novel Modeling of Cancer Cell Signaling Pathways Enables Systematic Drug Repositioning for Distinct Breast Cancer Metastases

Author: Angel Rodriguez
Buti
Ding Ren
Fuhai Li
Gordon
Guangxu Jin
Hong Zhao
Iorio
Jenny Chang
Kemi Cui
Peikai Chen
Rodriguez
Solomon Wong
Stephen TC Wong
Timothy Liu
Yubo Fan
Zhao
Publication venue: American Association for Cancer Research (AACR)

Crossref